
    Improving accuracy and efficiency of blind protein-ligand docking by focusing on predicted binding sites

    The use of predicted binding sites (binding sites calculated from the protein structure alone) is evaluated here as a tool to focus the docking of small molecule ligands into protein structures, simulating cases where the real binding sites are unknown. The resulting approach consists of a few independent docking runs carried out on small boxes, centered on the predicted binding sites, as opposed to one larger blind docking run that covers the complete protein structure. The focused and blind approaches were compared using a set of 77 known protein-ligand complexes and 19 ligand-free structures. The focused approach is shown to: (1) identify the correct binding site more frequently than blind docking; (2) produce more accurate docking poses for the ligand; (3) require less computational time. Additionally, the results show that very few real binding sites are missed in spite of focusing on only three predicted binding sites per target protein. Overall, the results indicate that, by improving the sampling in regions that are likely to correspond to binding sites, the focused docking approach increases the accuracy and efficiency of protein-ligand docking for those cases where the ligand-binding site is unknown. This is especially relevant in applications such as reverse virtual screening and structure-based functional annotation of proteins.
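
A minimal Python sketch of the two search-space setups contrasted above: one large box enclosing the whole protein (blind docking) versus small boxes centered on the top three predicted binding sites (focused docking). The helper names, the 20 Å cubic edge, and the 5 Å padding are illustrative assumptions, not parameters reported in the study; describing a box by its center and edge lengths is simply a common way docking programs take a search space.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Box:
    center: tuple[float, float, float]  # Å
    size: tuple[float, float, float]    # edge lengths in Å


def blind_box(atom_coords, padding=5.0):
    """One box enclosing every protein atom plus a margin (blind docking).

    Assumed: padding of 5 Å on each side is a hypothetical choice.
    """
    if not atom_coords:
        raise ValueError("need at least one atom coordinate")
    xs, ys, zs = zip(*atom_coords)
    lo = (min(xs), min(ys), min(zs))
    hi = (max(xs), max(ys), max(zs))
    center = tuple((a + b) / 2 for a, b in zip(lo, hi))
    size = tuple((b - a) + 2 * padding for a, b in zip(lo, hi))
    return Box(center, size)


def focused_boxes(site_centers, edge=20.0, top_n=3):
    """Small cubic boxes on the top-ranked predicted sites (focused docking).

    Assumed: site_centers is already sorted best-first; edge is hypothetical.
    """
    return [Box(tuple(c), (edge, edge, edge)) for c in site_centers[:top_n]]


def volume(box):
    """Search-space volume in Å^3."""
    x, y, z = box.size
    return x * y * z
```

Comparing `volume(blind_box(...))` with the summed volume of the focused boxes gives a rough sense of how much less space each focused run has to sample, which is the intuition behind the improved sampling described in the abstract.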

    Interaction-based discovery of functionally important genes in cancers

    A major challenge in cancer genomics is uncovering genes with an active role in tumorigenesis from a potentially large pool of mutated genes across patient samples. Here we focus on the interactions that proteins make with nucleic acids, small molecules, ions and peptides, and show that residues within proteins that are involved in these interactions are more frequently affected by mutations observed in large-scale cancer genomic data than are other residues. We leverage this observation to predict genes that play a functionally important role in cancers by introducing a computational pipeline (http://canbind.princeton.edu) for mapping large-scale cancer exome data across patients onto protein structures, and automatically extracting proteins with an enriched number of mutations affecting their nucleic acid, small molecule, ion or peptide binding sites. Using this computational approach, we show that many previously known genes implicated in cancers are enriched in mutations within the binding sites of their encoded proteins. By focusing on functionally relevant portions of proteins (specifically those known to be involved in molecular interactions), our approach is particularly well suited to detect infrequent mutations that may nonetheless be important in cancer, and should aid in expanding our functional understanding of the genomic landscape of cancer.
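
One simple way to quantify "an enriched number of mutations" in a binding site is a one-sided binomial test. If a protein has L residues, b of them in binding sites, and m observed mutations, then under a null of uniformly placed mutations each mutation lands in a binding site with probability b/L, and the p-value for k or more binding-site hits is P(X ≥ k) with X ~ Binomial(m, b/L). This is an assumed stand-in for illustration; the abstract does not state which statistic the pipeline uses, and a genome-wide scan would also need a multiple-testing correction.

```python
from math import comb


def binding_site_enrichment_pvalue(n_mutations: int, n_in_site: int,
                                   protein_length: int, site_size: int) -> float:
    """One-sided binomial p-value P(X >= n_in_site).

    Assumed null model (hypothetical, not taken from the pipeline): each
    mutation falls on a binding-site residue with probability
    site_size / protein_length, independently of the others.
    """
    if not (0 < site_size <= protein_length):
        raise ValueError("site_size must be in 1..protein_length")
    if not (0 <= n_in_site <= n_mutations):
        raise ValueError("n_in_site must be in 0..n_mutations")
    p = site_size / protein_length
    # Sum the upper tail of the binomial distribution.
    return sum(
        comb(n_mutations, i) * p**i * (1 - p) ** (n_mutations - i)
        for i in range(n_in_site, n_mutations + 1)
    )
```

A small p-value flags a protein whose binding-site residues are hit more often than their share of the sequence would predict, which is the kind of signal that lets infrequently mutated genes stand out.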

    EASYMIFS and SITEHOUND: a toolkit for the identification of ligand-binding sites in protein structures

    Summary: SITEHOUND uses Molecular Interaction Fields (MIFs) produced by EASYMIFS to identify protein structure regions that show a high propensity for interaction with ligands. The type of binding site identified depends on the probe atom used in the MIF calculation. The input to EASYMIFS is a PDB file of a protein structure; the output MIF serves as input to SITEHOUND, which in turn produces a list of putative binding sites. Extensive testing of SITEHOUND for the detection of binding sites for drug-like molecules and phosphorylated ligands has been carried out. Availability: EASYMIFS and SITEHOUND executables for Linux, Mac OS X, and MS Windows operating systems are freely available for download from http://sitehound.sanchezlab.org/download.html
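
To make the MIF-to-binding-site step concrete, here is a simplified Python sketch of the general idea: keep grid points where the probe interaction energy is favourable, group nearby points into clusters, and rank clusters by total energy. It does not reproduce SITEHOUND's actual algorithm or file formats; the single-linkage grouping, the −8.0 energy cutoff, the 1.5 Å linkage distance, and the input layout are all hypothetical choices.

```python
import math


def cluster_mif_points(points, energy_cutoff=-8.0, link_dist=1.5):
    """Group favourable MIF grid points into putative binding sites.

    points: list of ((x, y, z), energy) tuples, one per grid point.
    Assumed (hypothetical): cutoff and linkage distance values, and
    single-linkage clustering. Returns clusters ranked by total
    interaction energy, most favourable (most negative) first.
    """
    # 1. Keep only points with a favourable probe interaction energy.
    kept = [(xyz, e) for xyz, e in points if e <= energy_cutoff]
    clusters = []
    unvisited = set(range(len(kept)))
    # 2. Single linkage: grow each cluster through neighbours within link_dist.
    while unvisited:
        seed = unvisited.pop()
        members = [seed]
        frontier = [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited
                    if math.dist(kept[i][0], kept[j][0]) <= link_dist]
            for j in near:
                unvisited.remove(j)
            members.extend(near)
            frontier.extend(near)
        clusters.append([kept[i] for i in members])
    # 3. Rank clusters by summed energy (lower = stronger predicted site).
    clusters.sort(key=lambda c: sum(e for _, e in c))
    return clusters
```

Changing the probe atom changes which grid points pass the energy filter, which is why, as the summary notes, the type of site found depends on the probe.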

    Beyond structural genomics: computational approaches for the identification of ligand binding sites in protein structures

    Structural genomics projects have revealed structures for a large number of proteins of unknown function. Understanding the interactions between these proteins and their ligands would provide an initial step in their functional characterization. Binding site identification methods are a fast and cost-effective way to facilitate the characterization of functionally important protein regions. In this review we describe our recently developed methods for binding site identification in the context of existing methods. The advantage of energy-based approaches is emphasized, since they provide flexibility in the identification and characterization of different types of binding site.

    Automated identification of binding sites for phosphorylated ligands in protein structures

    Phosphorylation is a crucial step in many cellular processes, ranging from metabolic reactions involved in energy transformation to signaling cascades. In many instances, protein domains specifically recognize the phosphogroup. Knowledge of the binding site provides insights into the interaction, and it can also be exploited for therapeutic purposes. Previous studies have shown that proteins interacting with phosphogroups are highly heterogeneous, and no single property can be used to reliably identify the binding site. Here we present an energy-based computational procedure that exploits the protein three-dimensional structure to identify binding sites involved in the recognition of phosphogroups. The procedure is validated on three datasets containing more than 200 proteins binding to ATP, phosphopeptides, and phosphosugars. A comparison against three other generic binding site identification approaches shows higher accuracy for our method, with a correct identification rate in the 80–90% range for the top three predicted sites. Adding conservation information further improves the performance. The method presented here can be used as a first step in functional annotation or to guide mutagenesis experiments and further studies such as molecular docking.
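
A "correct identification rate for the top three predicted sites" is a top-k success rate: the fraction of proteins for which at least one of the k highest-ranked predicted sites matches the real binding site. The Python sketch below assumes a hypothetical success criterion (a predicted site center within 4 Å of any ligand atom); the abstract does not state the criterion the authors used.

```python
import math


def site_hits_ligand(site_center, ligand_atoms, cutoff=4.0):
    """Assumed criterion (hypothetical): site center within cutoff Å of a ligand atom."""
    return any(math.dist(site_center, atom) <= cutoff for atom in ligand_atoms)


def top_k_success_rate(predictions, ligands, k=3, cutoff=4.0):
    """Fraction of proteins with a correct site among the top-k predictions.

    predictions: per protein, a list of predicted site centers, best first.
    ligands: per protein, a list of ligand atom coordinates.
    """
    if not predictions or len(predictions) != len(ligands):
        raise ValueError("need one non-empty prediction list per ligand set")
    hits = sum(
        any(site_hits_ligand(c, lig, cutoff) for c in sites[:k])
        for sites, lig in zip(predictions, ligands)
    )
    return hits / len(predictions)
```

Reporting the rate at k = 1 and k = 3 side by side shows how much is gained by inspecting a few sites rather than only the top-ranked one.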

    Building towards precision medicine: empowering medical professionals for the next revolution

    A new paradigm in disease classification, diagnosis and treatment is rapidly approaching. Known as precision medicine, this new healthcare model incorporates and integrates genetic information, microbiome data, and information on patients' environment and lifestyle to better identify and classify disease processes, and to provide custom-tailored therapeutic solutions. Despite its promise, precision medicine faces several challenges that must be overcome to implement this new healthcare model successfully. In this paper we identify four main areas that require attention: data, tools and systems, regulations, and people. While there are important ongoing efforts to address the first three areas, we argue that the human factor needs to be taken into consideration as well. In particular, we discuss several studies showing that primary care physicians, and clinicians in general, feel underequipped to interpret genetic tests and direct-to-consumer genomic tests. Considering the importance of genetic information for precision medicine applications, this is a pressing issue that needs to be addressed. To increase the number of professionals with the expertise needed to correctly interpret their patients' genomic profiles, we propose several strategies that involve medical curriculum reform, specialist training, and ongoing physician training.

    A Machine Learning Approach for Predicting Patient Mortality with Heart Rate Variability Statistics

    The prediction of patient mortality in the healthcare system provides a metric by which hospitals can better manage patient care and assess the needs of each individual patient. As such, the development of better predictive methods is vital for improving patient outcomes and overall quality of care. Heart rate variability (HRV) is a measure of the heart's complex beating patterns, giving medical professionals additional insight into patient health. Previous research has demonstrated the potential use of heart rate variability as a metric for patient mortality prediction for various conditions; however, more work is necessary to validate HRV as a metric for a broader and more diverse set of patients. This study uses data from 2664 patients within the MIMIC-III clinical database, matched with patient electrocardiogram (ECG) data, to link HRV data with later patient mortality, examining the efficacy of HRV as a biomarker for predicting patient mortality and investigating possible avenues for future integration of HRV into patient mortality predictive algorithms.
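
HRV "statistics" are usually computed from the sequence of RR (beat-to-beat) intervals extracted from the ECG. The Python sketch below computes a few standard time-domain measures (mean RR, SDNN, RMSSD, pNN50) from intervals in milliseconds. The abstract does not list which HRV features the study used, so this feature set, and the choice of the sample standard deviation for SDNN, are assumptions; beat detection and artifact removal are assumed to have happened upstream.

```python
import math
import statistics


def hrv_time_domain(rr_ms):
    """Common time-domain HRV measures from RR intervals in milliseconds.

    Assumed (not from the study): this particular feature set, sample
    standard deviation for SDNN, and already-cleaned input intervals.
    """
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Successive differences between adjacent intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return {
        "mean_rr": statistics.fmean(rr_ms),
        "sdnn": statistics.stdev(rr_ms),  # overall variability
        "rmssd": math.sqrt(sum(d * d for d in diffs) / len(diffs)),  # beat-to-beat variability
        "pnn50": 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs),  # % of diffs > 50 ms
    }
```

Per-patient dictionaries like this can be stacked into a feature matrix for a mortality classifier; any model choice on top of them is outside what the abstract specifies.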

    Systematic assessment of accuracy of comparative model of proteins belonging to different structural fold classes

    In the absence of experimental structures, comparative modeling continues to be the chosen method for retrieving structural information on target proteins. However, models lack the accuracy of experimental structures. Alignment error and structural divergence (between target and template) influence model accuracy the most. Here, we examine the potential additional impact of backbone geometry, as our previous studies have suggested that the structural class (all-α, αβ, all-β) of a protein may influence the accuracy of its model. In the twilight zone (sequence identity ≤ 30%) and at a similar level of target-template divergence, the accuracy of protein models does indeed follow the trend all-α > αβ > all-β. This is mainly because the alignment accuracy follows the same trend (all-α > αβ > all-β), with backbone geometry playing only a minor role. Differences in the diversity of sequences belonging to different structural classes lead to the observed accuracy differences, thus enabling the accuracy of alignments/models to be estimated a priori in a class-dependent manner. This study provides a systematic description of and quantifies the structural class-dependent effect in comparative modeling. The study also suggests that datasets for large-scale sequence/structure analyses should have equal representations of different structural classes to avoid class-dependent bias.
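
The twilight zone is defined by target-template sequence identity, which is computed from a pairwise alignment. Percent identity has several definitions in use; the Python sketch below assumes one common choice (identical pairs divided by aligned positions where neither sequence has a gap) and the ≤ 30% threshold quoted above. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over gap-free aligned columns.

    Assumed definition (one of several in use): identical pairs divided by
    columns where neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def in_twilight_zone(identity: float, threshold: float = 30.0) -> bool:
    """True when target-template identity is at or below the threshold."""
    return identity <= threshold
```

Because the denominator changes with the definition (aligned columns, shorter sequence length, or full alignment length), identity values near the 30% boundary should be compared only when computed the same way.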